Psych 560: Principles of Psychological Measurement
Professor Neil H. Schwartz, Ph.D.
Guidelines for Reviewing a Test
Introduction
Reviewing a professionally published instrument is an excellent way to make
the concepts of measurement come alive. Concepts of reliability, standard error
of measurement, norms, validity, etc. begin to take on real meaning when
evaluated in the context of a real psychological or educational measuring
device. As such, you will have the opportunity to review a test and write up
your review in a 6-10 page paper due Monday, May 23rd, 2005.
General Approach to the Review
When reviewing a test, it is best, as Salvia & Ysseldyke
(2002) suggest, to adopt a "show me"
attitude. That is, do not expect test authors to speak candidly about the
shortcomings of their instruments. Test authors and test publishers are in
business, and while some are more conservative and cautious than others, at
the end of the day they are all still in the business of "selling"
their product. Thus, it will be up to you to evaluate
the adequacy of a test in terms of its strength as a measuring device, and
to determine the purpose for which, and the constituency on which, the test can
be successfully used.
Also, be prepared to search for information. Not all test manuals are neatly
organized and arranged, so pertinent
information can be difficult to find. Thus, consider yourself more of an
investigator searching for "truth" than a reporter describing what
the manual tells you. Look for inconsistencies between the text and the
tables of information in the manual. Be particularly alert to omissions of
information. For example, test-retest reliabilities are often reported, but the
test-retest interval is not; an instrument may owe a high
reliability value to a test-retest interval of only 24 to 48 hours.
Finally, when reading the text of the manual, pay particular attention to
inconsistencies between what the authors say about the instrument, and what the
data support. Often, authors will state information in the text of the manual that
is not supported by empirical data. In short, read with a keen eye.
Materials You Will Need to Acquire
The materials you will need are:
- The test manual (essential)
- Test materials-- that is, the
stimulus materials that are shown to the examinee.
- Examinee record booklets,
often called protocols. These are the sheets on which an examiner actually
records his/her scoring of examinee performance.
- Supplemental test documents.
Sometimes tests-- the PPVT-R, for example-- have what is called a
technical supplement. There you will often find pages devoted to data.
Many tests do not have this supplement, but if a test does have one, it is
essential to obtain it, and evaluate the instrument using the information
contained therein.
Components of the Review
In your review, plan on writing a critique of the instrument in terms of the
following:
- Summarize what the
test manual has to say about the test. Is it accurate? Is it overstated?
Is it supported by data? What is the writing style of the manual? Is it
pedantic, condescending, or difficult to decipher?
- Describe the norming population. Consider the representativeness
of the norms; the size of the sample; and the proportionate distribution of
the normative characteristics and elements.
- Discuss the kind of
scores provided by the instrument. Are they linearly-transformed, or are
the scores non-linear transformations? Is there sufficient cautionary
discussion of the use of nonlinear transformations? Does the manual describe
the method used to derive age-equivalent and/or grade-equivalent scores?
- Discuss the accuracy
of the instrument in terms of the reliability data. What kind of
reliability was calculated? What limitations does each type impose? How
high were the reliabilities? Who were the examinees on whom the reliability
data were collected? How many examinees were in the samples? Is the test
equally reliable for each age cohort of the instrument? Is the standard
error of measurement of the instrument uniform across age cohorts? What
type of reliability was used to establish the SEMs? Do you think the type of
reliability estimates used to establish the SEMs is justified?
- Discuss the validity
of the instrument. What kind of validity information is available in the
test manual? How was it derived? Is there sufficient discriminant
validity to establish what the test does and does not measure? How strong
are the validity coefficients? Does the manual's discussion of validity
match the data it provides? How old are the validity studies? What were
the criteria employed to establish the
criterion-related validities?
- Finally, your review
should conclude with a short paragraph on the adequacy of the instrument,
based on your review, and the extent to which the test can, and should,
be used.
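Two of the quantities the checklist above asks about can be sketched numerically: a linearly transformed standard score, and the standard error of measurement from classical test theory (SEM = SD × √(1 − r)). The sketch below is only illustrative; the means, standard deviations, and reliability values are hypothetical, not drawn from any particular instrument.

```python
import math

def to_standard_score(raw, raw_mean, raw_sd, new_mean=100, new_sd=15):
    """Linear transformation: a z-score rescaled to a new mean and SD
    (e.g., a deviation-IQ-style metric). Rank order and the spacing
    between scores are preserved, unlike percentile ranks, which are
    a nonlinear transformation."""
    z = (raw - raw_mean) / raw_sd
    return new_mean + z * new_sd

def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical subtest with raw mean 42 and raw SD 8:
print(round(to_standard_score(50, 42, 8), 1))            # 115.0

# SEM shrinks as reliability rises (SD = 15 in the new metric):
print(round(standard_error_of_measurement(15, 0.91), 1))  # 4.5
print(round(standard_error_of_measurement(15, 0.75), 1))  # 7.5
```

The comparison makes concrete why the type and size of a reliability estimate matter: a reported reliability of .91 yields a roughly ±4.5-point band (about 68% confidence) around an observed standard score, while .75 widens it to ±7.5 points.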